Deep Extreme Multi-label Learning

نویسندگان

Wenjie Zhang

Liwei Wang

Junchi Yan

Xiangfeng Wang

Hongyuan Zha

چکیده

Extreme multi-label learning (XML) or classification has been a practical and important problem since the boom of big data. The main challenge lies in the exponential label space which involves 2 possible label sets when the label dimension L is very large, e.g., in millions for Wikipedia labels. This paper is motivated to better explore the label space by building and modeling an explicit label graph. In the meanwhile, deep learning has been widely studied and used in various classification problems including multi-label classification, however it has not been sufficiently studied in this extreme but practical case, where the label space can be as large as in millions. In this paper, we propose a practical deep embedding method for extreme multi-label classification. Our method harvests the ideas of non-linear embedding and modeling label space with graph priors at the same time. Extensive experiments on public datasets for XML show that our method performs competitively against state-of-the-art result. Introduction In the field of machine learning, eXtreme Multi-label Learning (XML) addresses the problem of learning a classifier that can automatically tag a data sample with the most relevant subset of labels from a large label set. For instance, there are more than a million labels (i.e. categories) on Wikipedia and one may wish to build a classifier that can annotate a new article or web page with a subset of relevant Wikipedia categories. Extreme multi-label learning or specifically classification, is a very challenging research problem for the need to simultaneously deal with massive labels, dimensions and training points. Compared with traditional multi-label classification methods (Tsoumakas and Katakis 2006), extreme multi-label classification methods focus on tackling the problem of extremely high input dimensions for both input feature dimension and label dimension. It should also be emphasized that multi-label learning is distinct from multiclass classification (Wu, Lin, and Weng 2004) whose aim is to predict a single mutually exclusive label. In contrast, XML allows for the co-existence of more than one labels for a single data sample. One straightforward method for classification is to train an independent one-against-all classifier for each label, which is clearly not the optimal for multi-label classification as the dependencies among class labels can not be leveraged. Furthermore, in the context of extreme multi-label classification, this is not feasible since it will be almost computationally intractable to train a massive number of e.g. one million classifiers. The issue could be ameliorated if a label hierarchy was provided. However, such a hierarchy is often unavailable in many applications (Bhatia et al. 2015). The pain also lies in the prediction stage, where all the classifiers need to be evaluated for each testing data sample. To address these challenges, state-of-the-art extreme multi-label classification methods have been proposed recently, which in general can be divided into two categories: tree based methods and embedding based methods. Tree based methods (Weston, Makadia, and Yee 2013; Agrawal et al. 2013; Prabhu and Varma 2014) have become popular as they enjoy notable accuracy improvement over traditional embedding methods. The idea is to learn a hierarchy from the training data. The root is initialized to contain the whole label set. A node partition formulation is then optimized to determine which labels should be assigned to the left child or to the right. Nodes are recursively partitioned till each leaf contains a small number of labels. In the prediction stage, a testing data sample is passed down the tree until it arrives at the leaf nodes whereby its predicted labels are finally determined. Another important line of research is the embedding based method (Hsu et al. 2009; Zhang and Schneider 2011; Tai and Lin 2012; Balasubramanian and Lebanon 2012; Bi and Kwok 2013; Cisse et al. 2013). These approaches attempt to make training and prediction tractable by assuming that the training label matrix (of which each column/row corresponds a training sample’s label vector) is low-rank and reducing the effective number of labels by projecting the high dimensional label vectors onto a low dimension linear subspace. While for prediction, labels for a novel sample are predicted by post-processing where a decompression matrix lifts the embedded label vectors back to the original extremely high dimensional label space. However, the fundamental problem of the embedding method is the low-rank label space assumption, which is violated in most real world applications (Bhatia et al. 2015). In general, traditional embedding methods are more efficient for computing than tree based method at the expense of lower accuracy. Notably, the state-of-the-art embedding based method SLEEC (Sparse Local Embeddings for Extreme Multi-label ar X iv :1 70 4. 03 71 8v 3 [ cs .L G ] 1 9 O ct 2 01 7 Classification) (Bhatia et al. 2015) achieves significant accuracy gain while still being computationally economical, which is attributed to the non-linearity modeled through neighbourhood preserving constraints. This inspires us to take the step further along this direction. It is intriguing to take a deep neural network approach for non-linearity modeling. However, to the best of our knowledge, there is very few prior art on deep learning for XML. Perhaps more importantly, we make the observation that the label structure is also very important in those tree based methods where the node is each label. However, they seem to be totally ignored in state-of-the-art embedding methods. For instance, SLEEC focuses on dimension reduction on the raw label matrix (of which each column/row corresponds a training sample’s label vector) rather than modeling of the label graph structure as mentioned above. In this paper, we propose a deep learning based approach for the extreme multi-label learning. We extend traditional deep learning framework for multi-label classification with millions of labels via building non-linear embedding for both feature and label space. As far as we know, our method is the first that models the feature space non-linearity and label graph structure at the same time in solving this problem. Contribution In a nutshell, the contributions are: • To the best of our knowledge, this is the first work for explicit label graph structure modeling for the extreme multi-label learning. Note the label hierarchy explored by the tree based methods is different from label graph. • This paper is also the first embedding based model with a deep neural network for XML – see Fig.1. In fact, no prior art is identified for exploring deep learning neither for feature space reduction nor for label space reduction in the XML setting. • Extensive experiments on various public benchmark show that our method consistently produces the best or the second best results without ensemble. The paper is organized as follows. Section 2 introduces the related work. The details of the above mentioned concept and idea will be detailed in Section 3. Section 4 presents the main empirical results and Section 5 concludes this paper.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multi-view, Multi-label Learning with Deep Neural Networks

Deep learning is a popular technique in modern online and offline services. Deep neural network based learning systems have made groundbreaking progress in model size, training and inference speed, and expressive power in recent years, but to tailor the model to specific problems and exploit data and problem structures is still an ongoing research topic. We look into two types of deep ‘‘multi-’...

متن کامل

A New Method for Detecting Ships in Low Size and Low Contrast Marine Images: Using Deep Stacked Extreme Learning Machines

Detecting ships in marine images is an essential problem in maritime surveillance systems. Although several types of deep neural networks have almost ubiquitously used for this purpose, but the performance of such networks greatly drops when they are exposed to low size and low contrast images which have been captured by passive monitoring systems. On the other hand factors such as sea waves, c...

متن کامل

A High Speed Multi-label Classifier based on Extreme Learning Machines

In this paper a high speed neural network classifier based on extreme learning machines for multi-label classification problem is proposed and discussed. Multi-label classification is a superset of traditional binary and multiclass classification problems. The proposed work extends the extreme learning machine technique to adapt to the multi-label problems. As opposed to the singlelabel problem...

متن کامل

Learning Deep Latent Space for Multi-Label Classification

Multi-label classification is a practical yet challenging task in machine learning related fields, since it requires the prediction of more than one label category for each input instance. We propose a novel deep neural networks (DNN) based model, Canonical Correlated AutoEncoder (C2AE), for solving this task. Aiming at better relating feature and label domain data for improved classification, ...

متن کامل